Aiding Web Searches by Statistical Classification Tools

Authors

  • Gerhard Heyer
  • Uwe Quasthoff
  • Christian Wolff

Abstract

We describe an infrastructure for collecting and managing large amounts of text, and discuss how statistical methods can be used to extract and visualise information from text corpora. The paper gives an overview of the processing steps, the contents of our text databases, and the available query facilities. Our focus is on the extraction and visualisation of collocations and their use in aiding web searches.
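The abstract does not say which statistical measure is used to find collocations, so the following is only a minimal sketch under stated assumptions. It assumes that two words co-occur when they appear in the same sentence, and it scores a pair with a Poisson-based measure: the negative log-probability of observing exactly k joint occurrences when λ = ab/n are expected under independence (a and b are the sentence frequencies of the two words, n is the number of sentences), divided by log n, i.e. sig = (λ − k·log λ + log k!) / log n. The names `collocations`, `suggest_terms` and `min_count`, the toy corpus, and the idea of offering the strongest collocates as query refinement terms are hypothetical illustrations, not the authors' system.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(sentence: str) -> list[str]:
    # Deliberately simple stand-in for a real tokenizer: lower-cased ASCII words.
    return re.findall(r"[a-z]+", sentence.lower())


def poisson_significance(a: int, b: int, k: int, n: int) -> float:
    # -log P(X = k) for X ~ Poisson(lam), lam = a*b/n, normalised by log n.
    # math.lgamma(k + 1) equals log(k!). Assumes n >= 2 so that log n > 0.
    lam = a * b / n
    return (lam - k * math.log(lam) + math.lgamma(k + 1)) / math.log(n)


def collocations(sentences: list[str], min_count: int = 2) -> dict[tuple[str, str], float]:
    # Hypothetical helper: score word pairs that share a sentence more often than chance.
    n = len(sentences)
    word_freq = Counter()  # number of sentences containing each word
    pair_freq = Counter()  # number of sentences containing both words of a pair
    for sentence in sentences:
        words = sorted(set(tokenize(sentence)))  # each word counted once per sentence
        word_freq.update(words)
        pair_freq.update(combinations(words, 2))  # pairs come out alphabetically ordered
    scores = {}
    for (w1, w2), k in pair_freq.items():
        if k < min_count:
            continue
        expected = word_freq[w1] * word_freq[w2] / n
        if k > expected:  # keep only positive associations
            scores[(w1, w2)] = poisson_significance(word_freq[w1], word_freq[w2], k, n)
    return scores


def suggest_terms(query_word: str, scores: dict[tuple[str, str], float], top: int = 3) -> list[str]:
    # Hypothetical search aid: the most significant collocates of a query word.
    q = query_word.lower()
    partners = [(sig, w2 if w1 == q else w1)
                for (w1, w2), sig in scores.items() if q in (w1, w2)]
    partners.sort(key=lambda p: (-p[0], p[1]))  # strongest first, ties alphabetical
    return [word for _, word in partners[:top]]


if __name__ == "__main__":
    corpus = [
        "search engines index web pages",
        "users search the web with search engines",
        "a search engine ranks web pages",
        "collocations help refine web search queries",
        "statistical tools find collocations in text",
        "the corpus contains text from newspapers",
    ]
    print(suggest_terms("search", collocations(corpus)))  # prints ['web', 'engines', 'pages']
```

Counting each word at most once per sentence keeps a single repetitive sentence from inflating a pair's score, and the k > λ filter discards pairs that co-occur less often than chance, for which the Poisson probability would otherwise also yield a high value.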

Similar sources

TCDB: the Transporter Classification Database for membrane transport protein analyses and information

The Transporter Classification Database (TCDB) is a web accessible, curated, relational database containing sequence, classification, structural, functional and evolutionary information about transport systems from a variety of living organisms. TCDB is a curated repository for factual information compiled from >10,000 references, encompassing approximately 3000 representative transporters and ...

GFINDer: Genome Function INtegrated Discoverer through dynamic annotation, statistical analysis, and mining

Statistical and clustering analyses of gene expression results from high-density microarray experiments produce lists of hundreds of genes regulated differentially, or with particular expression profiles, in the conditions under study. Independent of the microarray platforms and analysis methods used, these lists must be biologically interpreted to gain a better knowledge of the patho-physiolog...

The Visualization of Evolving Searches

It is a common misconception that all web searches can be answered with a single query. It is true that when users have a clear idea of what they are searching for, they can specify an accurate and efficient query to the search engine and find pertinent results in the first 10 search results returned. However, studies of search engine usage by Jansen et al. ([56], [57], [59]) show that, on aver...

Familiarity with and Use of Web 2.0 Tools in Library Services by Librarians Working at Iran, Tehran, and Shahid Beheshti Universities of Medical Sciences

Background and Aim: Web 2.0 technology has various usages in libraries all over the world. According to studies, however, it seems that this technology is rarely used in Iranian academic libraries. Therefore, the present study aims to determine the level of familiarity with and use of Web 2.0 tools among librarians working at Iran, Tehran, and Shahid Beheshti Universities of Medical Sciences. ...

Using unlabeled data to improve classification in the naive bayes approach: Application to web searches

This paper introduces a method to build a classifier based on labeled and unlabeled data. We set up the EM algorithm steps for the particular case of the naive Bayes approach and show empirical work for the restricted web page database. Original contributions include the application of the EM algorithm to simulated data in order to see the behavior of the algorithm for different numbers of lab...

Publication date: 2000